Data Validation
Heliconia Demographic Survey Data
We use the R package pointblank
to review and validate the plot-level descriptors
(HDP_plots.csv) and the clean demographic data set
(heliconia_survey_clean.csv) in preparation for archiving
in Dryad and publication in Bruna et al. (2023). The report below
includes:
- the validation tests that were conducted,
- the date of the most recent test,
- each test’s criteria for ‘pass’, ‘warn’ and ‘stop’,
- the number of ‘units’ (i.e., rows or columns) assessed in each test,
- how many of these units passed or failed, and
- a button for downloading a .csv file of the records flagged by a particular validation test.

Note that flagged records are not necessarily errors. For instance, the plant-height step flags all plants >2 m tall; Heliconia plants can exceed this threshold, and the test is simply designed to flag any such individuals for review. In contrast, the data set should not have any duplicated rows, so a ‘fail’ on that test indicates an error that can be corrected by downloading the .csv file, reviewing the duplicated rows, and uploading the necessary corrections.
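The pass/warn/stop logic described above can be sketched in a few lines. This is a minimal Python illustration, not the pointblank code that produced this report. It assumes the convention shown in the report headers (WARN 1, STOP 0.02): a threshold of 1 or more is read as a count of failing units, a threshold below 1 as a fraction of the units assessed, and a level is triggered when failures meet or exceed it. The function names `exceeds` and `classify` are hypothetical.

```python
# Sketch of pointblank-style action levels (not the report's actual code).
# A threshold >= 1 is a count of failing units; a threshold < 1 is a
# fraction of units assessed. None means the level is not set.


def exceeds(n_fail, n_units, threshold):
    """Return True if the failing units meet or exceed the threshold."""
    if threshold is None:
        return False
    if threshold >= 1:
        return n_fail >= threshold
    return n_units > 0 and n_fail / n_units >= threshold


def classify(n_fail, n_units, warn_at=1, stop_at=0.02):
    """Return 'stop', 'warn', or 'pass' for one validation step."""
    if exceeds(n_fail, n_units, stop_at):
        return "stop"
    if exceeds(n_fail, n_units, warn_at):
        return "warn"
    return "pass"


# Illustrative counts only, not taken from the survey data.
print(classify(0, 1000))   # pass
print(classify(5, 1000))   # warn: 5 failures, but only 0.5% of units
print(classify(25, 1000))  # stop: 2.5% of units failed
print(classify(1, 1000, warn_at=None, stop_at=1))  # stop: strict test
```

Under this sketch, a ‘strict’ test can be expressed as `warn_at=None, stop_at=1`, so that a single failing unit triggers ‘stop’.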
Dataset Structure: Data types
Tests to determine if columns are correctly coded as integer,
character, etc.
Test criteria: Strict (‘stop’ if any rows
fail).
Pointblank Validation: Data Validation (tibble; WARN 1, STOP 0.02, NOTIFY not set)

| STEP | DESCRIPTION | EVAL | UNITS | PASS | FAIL | W | S | N | EXT |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Height is measured to nearest cm | ✓ | 57K | 57K (1.00) | 0 (0.00) | — | ○ | — | — |
| 2 | Shoots is integer | ✓ | 57K | 57K (1.00) | 0 (0.00) | — | ○ | — | — |
| 3 | Number of inflorescences is integer | ✓ | 2K | 2K (1.00) | 0 (0.00) | — | ○ | — | — |

Interrogation: 2023-05-31 18:38:36 EDT to 2023-05-31 18:38:36 EDT (< 1 s)
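As a rough illustration of the whole-number checks above, the Python sketch below flags values of `ht`, `shts`, and `infl` that are not whole numbers. The records are invented, and skipping missing values is an assumption about how NAs were handled; the function name `non_integer_rows` is hypothetical.

```python
# Flag values that are not whole numbers (e.g., a height of 51.5 cm).
# Missing values (None) are skipped: an assumption, not the report's rule.


def non_integer_rows(records, column):
    """Return indices of records whose value in `column` is not whole."""
    flagged = []
    for i, rec in enumerate(records):
        value = rec.get(column)
        if value is None:
            continue
        if not float(value).is_integer():
            flagged.append(i)
    return flagged


# Invented example records using the column names from the text.
records = [
    {"ht": 45, "shts": 2, "infl": None},
    {"ht": 51.5, "shts": 3, "infl": 1},
    {"ht": 120, "shts": 2.0, "infl": 0.5},
]

for col in ("ht", "shts", "infl"):
    print(col, non_integer_rows(records, col))
# ht [1]
# shts []
# infl [2]
```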
Dataset Structure: Plot & Subplot IDs
Tests for any nonexistent values of plot_id (e.g.,
‘FF-10’, ‘CF-23’) or subplot (e.g., ‘H23’, ‘A11’).
Test criteria: Strict (‘stop’ if any rows
fail).
Pointblank Validation: Data Validation (tibble; WARN 1, STOP 0.02, NOTIFY not set)

| STEP | DESCRIPTION | EVAL | UNITS | PASS | FAIL | W | S | N | EXT |
|---|---|---|---|---|---|---|---|---|---|
| 1 | col_vals_in_set() | ✓ | 67K | 67K (1.00) | 0 (0.00) | — | ○ | — | — |
| 2 | col_vals_in_set() | ✓ | 67K | 67K (1.00) | 0 (0.00) | — | ○ | — | — |

Interrogation: 2023-05-31 18:38:37 EDT to 2023-05-31 18:38:38 EDT (< 1 s)
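A minimal sketch of the set-membership logic behind the two `col_vals_in_set()` steps: any `plot_id` or `subplot` value that is not in the allowed set is flagged. The allowed sets below are hypothetical placeholders (including the assumed subplot grid of letters A to J by numbers 1 to 5); in practice they would come from the study design.

```python
# Hypothetical allowed values, for illustration only.
VALID_PLOTS = {"FF-1", "FF-2", "CF-1", "CF-2"}
VALID_SUBPLOTS = {f"{row}{col}" for row in "ABCDEFGHIJ" for col in range(1, 6)}


def not_in_set(records, column, allowed):
    """Return indices of records whose `column` value is not allowed."""
    return [i for i, rec in enumerate(records) if rec.get(column) not in allowed]


# Invented records; 'FF-10' and 'H23' mirror the examples in the text.
records = [
    {"plot_id": "FF-1", "subplot": "A1"},
    {"plot_id": "FF-10", "subplot": "B3"},
    {"plot_id": "CF-2", "subplot": "H23"},
]

print(not_in_set(records, "plot_id", VALID_PLOTS))     # [1]
print(not_in_set(records, "subplot", VALID_SUBPLOTS))  # [2]
```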
Dataset Structure: Duplicated or Missing Values
Tests for duplicated rows, missing plant_id numbers, duplicate
plant_id numbers (checked for every survey year), and duplicate
tag numbers within a plot.
Test criteria: Strict (‘stop’ if any rows
fail).
Pointblank Validation: Data Validation (tibble; WARN 1, STOP 0.02, NOTIFY not set)

| STEP | DESCRIPTION | EVAL | UNITS | PASS | FAIL | W | S | N | EXT |
|---|---|---|---|---|---|---|---|---|---|
| 1 | duplicated rows | ✓ | 67K | 67K (1.00) | 0 (0.00) | — | ○ | — | — |
| 2 | col_vals_not_null() | ✓ | 67K | 67K (1.00) | 0 (0.00) | — | ○ | — | — |
| 3 | Check for duplicate plant ID numbers | ✓ | 9K | 9K (1.00) | 0 (0.00) | — | ○ | — | — |
| 4 | Check for duplicate tag numbers in a plot | ✓ | 64 | 0 (0.00) | 64 (1.00) | — | ● | — | |

Interrogation: 2023-05-31 18:38:38 EDT to 2023-05-31 18:38:43 EDT (5.1 s)
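The four structural checks above can be approximated in plain Python. This is a sketch, not the pointblank steps themselves: `plant_id`, `plot_id`, and `tag` follow the text, while the `year` field and grouping tag numbers by plot and year are assumptions, and the records are made up.

```python
from collections import Counter


def duplicated_rows(records):
    """Indices of rows that exactly repeat an earlier row."""
    seen = set()
    dups = []
    for i, rec in enumerate(records):
        key = tuple(sorted(rec.items()))
        if key in seen:
            dups.append(i)
        seen.add(key)
    return dups


def missing(records, column):
    """Indices of rows where `column` is missing (None)."""
    return [i for i, rec in enumerate(records) if rec.get(column) is None]


def duplicate_keys(records, *columns):
    """Key combinations that occur in more than one row."""
    counts = Counter(tuple(rec.get(c) for c in columns) for rec in records)
    return [key for key, n in counts.items() if n > 1]


# Invented records; `year` is a hypothetical field name.
records = [
    {"plot_id": "FF-1", "tag": 101, "plant_id": 1, "year": 1998},
    {"plot_id": "FF-1", "tag": 101, "plant_id": 1, "year": 1998},  # exact duplicate
    {"plot_id": "FF-1", "tag": 102, "plant_id": None, "year": 1998},
    {"plot_id": "FF-1", "tag": 102, "plant_id": 3, "year": 1999},
]

print(duplicated_rows(records))                           # [1]
print(missing(records, "plant_id"))                       # [2]
print(duplicate_keys(records, "plant_id", "year"))        # [(1, 1998)]
print(duplicate_keys(records, "plot_id", "tag", "year"))  # [('FF-1', 101, 1998)]
```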
Plant Characteristics: Size & Flowering
Tests to determine how many values of plant size (shts,
ht) or inflorescence number (infl) fall outside the range
that encompasses most values (shoots 0 to 20, height 0 to 200 cm,
inflorescences 0 to 3).
Test criteria: ‘warn’ if \(\geq\) 1 row fails the condition, ‘stop’ if
\(\geq\) 2% of rows fail the condition.
Pointblank Validation: Data Validation (tibble; WARN 1, STOP 0.02, NOTIFY not set)

| STEP | DESCRIPTION | EVAL | UNITS | PASS | FAIL | W | S | N | EXT |
|---|---|---|---|---|---|---|---|---|---|
| 1 | shoots between 0 and 20 | ✓ | 67K | 67K (0.99) | 8 (0.01) | ● | ○ | — | |
| 2 | height between 0 and 200 cm | ✓ | 67K | 67K (0.99) | 2 (0.01) | ● | ○ | — | |
| 3 | inflorescences between 0 and 3 | ✓ | 67K | 67K (0.99) | 15 (0.01) | ● | ○ | — | |

Interrogation: 2023-05-31 18:38:44 EDT to 2023-05-31 18:38:44 EDT (< 1 s)
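The range tests amount to flagging non-missing values outside fixed bounds. This sketch takes the bounds from the step labels in the table above and assumes they are inclusive and that missing values are skipped; for each column it returns the number of flagged values and the number assessed, which could then be compared with the warn/stop thresholds. The function names are hypothetical.

```python
# Bounds from the step labels; inclusive bounds are an assumption.
RANGES = {"shts": (0, 20), "ht": (0, 200), "infl": (0, 3)}


def out_of_range(records, column, low, high):
    """Indices of rows whose non-missing value lies outside [low, high]."""
    return [
        i
        for i, rec in enumerate(records)
        if rec.get(column) is not None and not low <= rec[column] <= high
    ]


def summarize(records):
    """Map each column to (number flagged, number of values assessed)."""
    summary = {}
    for column, (low, high) in RANGES.items():
        flagged = out_of_range(records, column, low, high)
        assessed = sum(rec.get(column) is not None for rec in records)
        summary[column] = (len(flagged), assessed)
    return summary


# Invented records.
records = [
    {"shts": 3, "ht": 150, "infl": 0},
    {"shts": 25, "ht": 210, "infl": 1},
    {"shts": 1, "ht": None, "infl": 4},
]

print(summarize(records))
# {'shts': (1, 3), 'ht': (1, 2), 'infl': (1, 3)}
```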
Plant Characteristics: Growth
Tests for unusual changes in plant size (both height and shoot
number) from \(Year_{t}\) to \(Year_{t+1}\).
Test criteria: ‘warn’ if \(\geq\) 1 row fails the condition, ‘stop’ if
\(\geq\) 2% of rows fail the condition.
Pointblank Validation: Check growth & regression (tibble; WARN 1, STOP 0.02, NOTIFY not set)

| STEP | DESCRIPTION | EVAL | UNITS | PASS | FAIL | W | S | N | EXT |
|---|---|---|---|---|---|---|---|---|---|
| 1 | \|% change in height\| < 200% | ✓ | 67K | 66K (0.99) | 420 (0.01) | ● | ○ | — | |
| 2 | \|∆ height\| < 100 cm | ✓ | 67K | 67K (0.99) | 11 (0.01) | — | ● | — | |
| 3 | \|∆ shoot number\| < 5 | ✓ | 67K | 67K (0.99) | 201 (0.01) | — | ● | — | |

Interrogation: 2023-05-31 18:38:45 EDT to 2023-05-31 18:38:46 EDT (< 1 s)
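The year-to-year comparisons can be sketched by pairing each plant’s record in year \(t\) with its record in year \(t+1\). Several choices here are assumptions: percent change is computed relative to height in year \(t\), a change is flagged when it reaches the cutoff in the step label, plants with zero height in year \(t\) are skipped for the percent test, and the `year` field name is hypothetical. The records are invented.

```python
def growth_flags(records, max_pct=200.0, max_dht=100.0, max_dshts=5):
    """Return (plant_id, year_t, reason) for suspicious year-to-year changes."""
    # Index records by plant, then by survey year.
    by_plant = {}
    for rec in records:
        by_plant.setdefault(rec["plant_id"], {})[rec["year"]] = rec
    flags = []
    for pid, years in by_plant.items():
        for year in sorted(years):
            nxt = years.get(year + 1)
            if nxt is None:
                continue  # no survey record in year t+1
            cur = years[year]
            h0, h1 = cur.get("ht"), nxt.get("ht")
            if h0 is not None and h1 is not None:
                if h0 > 0 and abs(h1 - h0) / h0 * 100 >= max_pct:
                    flags.append((pid, year, "pct_ht"))
                if abs(h1 - h0) >= max_dht:
                    flags.append((pid, year, "delta_ht"))
            s0, s1 = cur.get("shts"), nxt.get("shts")
            if s0 is not None and s1 is not None and abs(s1 - s0) >= max_dshts:
                flags.append((pid, year, "delta_shts"))
    return flags


records = [
    {"plant_id": 1, "year": 1998, "ht": 20, "shts": 1},
    {"plant_id": 1, "year": 1999, "ht": 70, "shts": 2},  # +250% height
    {"plant_id": 2, "year": 1998, "ht": 150, "shts": 8},
    {"plant_id": 2, "year": 1999, "ht": 40, "shts": 2},  # -110 cm, -6 shoots
]

print(growth_flags(records))
# [(1, 1998, 'pct_ht'), (2, 1998, 'delta_ht'), (2, 1998, 'delta_shts')]
```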
Seedlings: Initial size
Tests for seedlings whose size at initial marking was unusually
large, based on both shoot number (3 or more shoots) and height
(30 cm or taller).
Test criteria: ‘warn’ if \(\geq\) 1 row fails the condition, ‘stop’ if
\(\geq\) 2% of rows fail the condition.
Pointblank Validation: Check seedlings (tibble; WARN 1, STOP 0.02, NOTIFY not set)

| STEP | DESCRIPTION | EVAL | UNITS | PASS | FAIL | W | S | N | EXT |
|---|---|---|---|---|---|---|---|---|---|
| 1 | shoots < 3 | ✓ | 3K | 3K (0.99) | 12 (0.01) | ● | ○ | — | |
| 2 | height < 30 cm | ✓ | 3K | 3K (0.99) | 3 (0.01) | ● | ○ | — | |

Interrogation: 2023-05-31 18:38:47 EDT to 2023-05-31 18:38:47 EDT (< 1 s)
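A sketch of the seedling checks: among records for newly marked seedlings, flag those with 3 or more shoots or a height of 30 cm or more. The boolean field `new_seedling` is hypothetical; the actual data set may identify first-time seedling records differently.

```python
def oversized_seedlings(records, max_shts=3, max_ht=30):
    """Flag newly marked seedlings that are unusually large."""
    flagged = []
    for rec in records:
        if not rec.get("new_seedling"):
            continue  # only records at initial marking are assessed
        if rec.get("shts") is not None and rec["shts"] >= max_shts:
            flagged.append((rec["plant_id"], "shts"))
        if rec.get("ht") is not None and rec["ht"] >= max_ht:
            flagged.append((rec["plant_id"], "ht"))
    return flagged


# Invented records; `new_seedling` is a hypothetical field.
records = [
    {"plant_id": 10, "new_seedling": True, "shts": 1, "ht": 12},
    {"plant_id": 11, "new_seedling": True, "shts": 4, "ht": 18},
    {"plant_id": 12, "new_seedling": True, "shts": 1, "ht": 35},
    {"plant_id": 13, "new_seedling": False, "shts": 6, "ht": 90},
]

print(oversized_seedlings(records))  # [(11, 'shts'), (12, 'ht')]
```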
Missing values: Height
Graphical depiction of the proportion of plants in each demographic plot for which there is no measurement of plant height (e.g., because the plant was not found).
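The proportion shown in this figure could be computed as follows: for each plot, the share of plant records with no height measurement. This sketch pools all survey years and uses invented records; both are simplifications for illustration, and the function name is hypothetical.

```python
from collections import defaultdict


def missing_height_by_plot(records):
    """Return {plot_id: proportion of records with no height value}."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for rec in records:
        totals[rec["plot_id"]] += 1
        if rec.get("ht") is None:
            missing[rec["plot_id"]] += 1
    return {plot: missing[plot] / totals[plot] for plot in sorted(totals)}


records = [
    {"plot_id": "FF-1", "ht": 10},
    {"plot_id": "FF-1", "ht": None},
    {"plot_id": "FF-1", "ht": 30},
    {"plot_id": "FF-1", "ht": None},
    {"plot_id": "CF-1", "ht": 5},
    {"plot_id": "CF-1", "ht": 7},
]

print(missing_height_by_plot(records))  # {'CF-1': 0.0, 'FF-1': 0.5}
```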
Zombie plants
Zombie plants are those that were recorded as ‘Dead’ in a survey but
for which there is a measurement in a subsequent year (indicative of the
plant losing all above-ground parts and then new shoots emerging prior
to the next survey). This validation generates a .csv of
any plants meeting this condition (labeled as ‘zombie’) for review and
correction.
Pointblank Validation: Check for zombies (tibble; WARN 1, STOP 0.02, NOTIFY not set)

| STEP | DESCRIPTION | EVAL | UNITS | PASS | FAIL | W | S | N | EXT |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Check for Zombies | ✓ | 0 | 0 (NA) | 0 (NA) | — | ○ | — | — |

Interrogation: 2023-05-31 18:38:50 EDT to 2023-05-31 18:38:50 EDT (< 1 s)
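The zombie check can be sketched as: sort each plant’s records by year, find the first survey in which it was recorded as ‘Dead’, and flag it if any later survey has a height or shoot measurement. The field names `status` and `year` and the value "Dead" are assumptions based on the description above, and the records are invented.

```python
def find_zombies(records):
    """Return (plant_id, year recorded dead, later year with a measurement)."""
    by_plant = {}
    for rec in records:
        by_plant.setdefault(rec["plant_id"], []).append(rec)
    zombies = []
    for pid, recs in by_plant.items():
        death_year = None
        for rec in sorted(recs, key=lambda r: r["year"]):
            measured = rec.get("ht") is not None or rec.get("shts") is not None
            if death_year is None:
                if rec.get("status") == "Dead":
                    death_year = rec["year"]
            elif measured:
                zombies.append((pid, death_year, rec["year"]))
                break  # one flag per plant is enough for review
    return zombies


records = [
    {"plant_id": 1, "year": 1998, "status": "alive", "ht": 40, "shts": 1},
    {"plant_id": 1, "year": 1999, "status": "Dead", "ht": None, "shts": None},
    {"plant_id": 1, "year": 2000, "status": "alive", "ht": 15, "shts": 1},
    {"plant_id": 2, "year": 1998, "status": "alive", "ht": 60, "shts": 2},
    {"plant_id": 2, "year": 1999, "status": "Dead", "ht": None, "shts": None},
]

print(find_zombies(records))  # [(1, 1999, 2000)]
```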